Python: Improve prompt-template msg serialize and sample usage#13738
Python: Improve prompt-template msg serialize and sample usage#13738moonbox3 wants to merge 6 commits intomicrosoft:mainfrom
Conversation
There was a problem hiding this comment.
Pull request overview
This pull request improves Python prompt-template message serialization by using XML element serialization (instead of manual string concatenation) for Jinja2 and Handlebars message helpers, updates related samples to use serializer-backed helpers, and adds regression tests to ensure correct escaping and round-tripping of chat history with XML metacharacters.
Changes:
- Update Jinja2 and Handlebars
messagehelpers to build message XML viaElementTreeserialization to correctly escape XML metacharacters. - Align Azure chat prompt-template samples to use
message_to_promptfor serializer-backed message rendering. - Add unit and end to end tests covering escaping and chat history round-trip behavior, including system message preservation.
Reviewed changes
Copilot reviewed 8 out of 8 changed files in this pull request and generated 2 comments.
Show a summary per file
| File | Description |
|---|---|
| python/semantic_kernel/prompt_template/utils/jinja2_system_helpers.py | Serialize message() output via XML element serialization to escape metacharacters. |
| python/semantic_kernel/prompt_template/utils/handlebars_system_helpers.py | Serialize message block helper output via XML element serialization to escape metacharacters. |
| python/tests/unit/prompt_template/test_jinja2_prompt_template.py | Add regression test validating escaping and round-trip parsing for message() helper. |
| python/tests/unit/prompt_template/test_jinja2_prompt_template_e2e.py | Add end to end tests for round-trip with metacharacters and system role preservation. |
| python/tests/unit/prompt_template/test_handlebars_prompt_template.py | Add regression test validating escaping and round-trip parsing for message block helper. |
| python/tests/unit/prompt_template/test_handlebars_prompt_template_e2e.py | Add end to end tests for round-trip with metacharacters and system role preservation. |
| python/samples/concepts/prompt_templates/azure_chat_gpt_api_jinja2.py | Update sample template to use message_to_prompt(item) for serializer-backed rendering. |
| python/samples/concepts/prompt_templates/azure_chat_gpt_api_handlebars.py | Update sample template to use {{message_to_prompt}} for serializer-backed rendering. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
python/tests/unit/prompt_template/test_jinja2_prompt_template_e2e.py
Outdated
Show resolved
Hide resolved
python/tests/unit/prompt_template/test_handlebars_prompt_template_e2e.py
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Automated Code Review
Reviewers: 3 | Confidence: 89%
✓ Security Reliability
This PR is a security improvement that replaces manual f-string XML construction with xml.etree.ElementTree for generating XML tags in both Handlebars and Jinja2 prompt template helpers. Previously, user content containing XML metacharacters (<, >, &, ") was injected verbatim into XML strings, which could corrupt the XML structure and cause incorrect message reconstruction via ChatHistory.from_rendered_prompt(). The fix uses Element/tostring for proper escaping, consistent with how the rest of the codebase (ChatMessageContent.to_element, ChatHistory.from_rendered_prompt) already handles XML. The use of xml.etree.ElementTree for serialization-only (not parsing) is safe and matches the existing nosec B405 pattern in 12+ other files; defusedxml is correctly used elsewhere for parsing. The sample files are updated to demonstrate the message_to_prompt helper. Test coverage is thorough, including round-trip tests that verify ChatHistory.from_rendered_prompt correctly recovers the original messages after escaping.
✓ Test Coverage
This PR switches the handlebars and jinja2
_messagehelpers from manual XML string concatenation toxml.etree.ElementTree, properly escaping XML metacharacters in message content and attribute values. The test additions are well-structured: they verify specific escape sequences (<,&, unescaped"in text), and critically verify round-trip fidelity viaChatHistory.from_rendered_prompt. Both template engines (handlebars and jinja2) have symmetric unit and e2e tests. Minor gaps: no test coversNone/empty content in_messagehelpers (the jinja2 helper previously renderedNoneas the literal string "None" — now fixed but untested), and no test exercises attribute-value escaping in the handlebars_messageblock helper kwargs path. These are non-blocking.
✓ Design Approach
The PR correctly fixes an XML injection bug in both the Handlebars and Jinja2
_messagehelpers by switching from manual string concatenation toxml.etree.ElementTree.Element+tostring(), which properly escapes XML metacharacters in content. The approach is sound and the# nosec B405annotations are appropriate (we are generating XML, not parsing untrusted input). The sample changes steering users towardmessage_to_promptare a good idea since it usesto_prompt()/to_element()for full-fidelity serialization. However, the jinja2_message(item)helper still only serializesitem.content(the first TextContent text), silently dropping image items, function-call items, and any other non-text items from the message. Its sibling_message_to_promptavoids this by delegating toitem.to_prompt()/to_element(). Since_message(item)accepts a fullChatMessageContent, calers reasonably expect full serialization — yet the implementation silently produces an incomplete representation. The better approach for_messagein jinja2 would be to callitem.to_element()and usetostring()on the resulting element, exactly asto_prompt()does, rather than manually setting only the text field.
Suggestions
- The Jinja2
_message(item)helper setsmessage.text = item.content, which only captures the first TextContent's text and silently drops non-text items (images, function calls, etc.). Additionally,item.contentcould be a non-string type, andElementTree.tostringbehavior with non-strings is implementation-defined. The better approach is to delegate toitem.to_element()and calltostring()on that—the same way_message_to_promptworks viaitem.to_prompt()—which would avoid data loss for multi-modal messages and eliminate the type-safety concern. - Add a test for
None/empty content in the Jinja2_messagehelper—the old code renderedNoneas the literal string "None", and the new code correctly produces an empty element, but this fix is untested. A round-trip test with aChatMessageContentthat has no text items (e.g., a FunctionCallContent-only message) through{{ message(item) }}would lock this in. - Consider adding a test for attribute values containing XML metacharacters in the Handlebars
_messageblock helper (e.g.,{{#message role=role custom_attr=value}}...{{/message}}wherevaluecontains<or"). TheElement.set()call correctly escapes these, but there's no test verifying it. - The Handlebars sample (
azure_chat_gpt_api_handlebars.py) switched from{{#message role=role}}{{~content~}}{{/message}}to{{message_to_prompt}}, which changes the XML serialization format (from bare text to<text>child elements). While both formats round-trip correctly, this sample change is orthogonal to the XML-escaping fix and could be confusing to users who reference the old sample pattern. - The Handlebars
_messagehelper now setsmessage.text = str(options['fn'](this)), treating the entire block output as plain text. For users withallow_dangerously_set_content=Truewho previously relied on embedding raw XML sub-structure (e.g., image or function-call tags) directly in the block, this is a silent behavior change. It would be worth documenting that_messagenow always produces plain-text message content, and thatmessage_to_promptshould be used for structured/multi-modal content.
Automated review by moonbox3's agents
- Use public re-export for AuthorRole (from semantic_kernel.contents) in test_jinja2_prompt_template_e2e.py and test_handlebars_prompt_template_e2e.py - Replace manual XML construction in _message() with item.to_element() to properly serialize all content items (images, function calls, etc.) - Remove unused Enum and Element imports from jinja2_system_helpers.py - Update test assertion to match correct to_element() output format Co-authored-by: Copilot <[email protected]>
chetantoshniwal
left a comment
There was a problem hiding this comment.
Automated Code Review
Reviewers: 4 | Confidence: 87%
✓ Correctness
This PR fixes XML metacharacter escaping in both Handlebars and Jinja2 prompt template message helpers by switching from manual string concatenation to xml.etree.ElementTree for XML construction. The Handlebars _message helper now uses Element/tostring to properly escape content and attributes. The Jinja2 _message helper delegates to item.to_element()+tostring(), which changes its output format (content is now wrapped in child elements). Samples are updated to prefer message_to_prompt, and comprehensive tests verify round-trip correctness with XML metacharacters. The changes are logically correct and the tests properly validate the new escaping behavior.
✓ Security Reliability
This PR is a clear security improvement that fixes XML injection vulnerabilities in both the Handlebars and Jinja2 prompt template helpers. Previously, user-controlled content (message text, role values) was directly interpolated into XML strings via f-strings without escaping, allowing XML metacharacters like
<,&, and"to break the XML structure or inject malicious content. The fix correctly switches toxml.etree.ElementTree.Elementandtostring()for proper XML serialization with automatic escaping. The# nosec B405suppression is appropriate since ElementTree is used only for XML generation, not parsing (where XXE risks exist). The new tests thoroughly verify escaping and round-trip fidelity. No blocking security or reliability issues found.
✓ Test Coverage
The PR properly escapes XML metacharacters in both the Handlebars and Jinja2
_messagehelpers by switching toxml.etree.ElementTreefor XML generation. Test coverage is good overall: new unit tests and e2e tests verify escaping of<,&, and", plus round-trip fidelity viaChatHistory.from_rendered_prompt. However, there are a few gaps: (1) no test covers the empty-content path added in the Handlebars helper (if content: message.text = contentcombined withshort_empty_elements=False), (2) the Handlebars and Jinja2_messagehelpers now produce structurally different XML (flat text vs<text>sub-elements viato_element()), and there is no test that validates both round-trip correctly for the same input, and (3) the>character is not tested even though it is an XML metacharacter.
✗ Design Approach
The PR correctly addresses XML injection by switching from f-string XML concatenation to
xml.etree.ElementTree, which properly escapes metacharacters. However, it introduces a structural inconsistency between the two template-engine paths: the Jinja2_messagenow delegates toitem.to_element()(which wraps content in a<text>child element, producing<message role=".."><text>…</text></message>), while the Handlebars_messageblock helper still builds the element manually and setsmessage.text = content(producing the flat<message role="...">…</message>form). Becausefrom_rendered_promptmust handle both formats and the serialization protocol is the only contract between the renderer and the parser, having two divergent XML shapes for the same logical object is a fragile, leaky design. The right fix is to unify both paths throughitem.to_element()/ a shared serialisation helper, not to fix one engine while leaving the other inconsistent.
Automated review by chetantoshniwal's agents
python/semantic_kernel/prompt_template/utils/handlebars_system_helpers.py
Outdated
Show resolved
Hide resolved
python/semantic_kernel/prompt_template/utils/handlebars_system_helpers.py
Show resolved
Hide resolved
…format The handlebars _message helper was producing <message role="...">content</message> (direct text node) while the Jinja2 path produced <message role="..."><text>content</text></message> via to_element(). When the context is a ChatMessageContent, delegate to to_element() so the XML contract is consistent across template engines. Keep the manual Element construction as a fallback for non-ChatMessageContent contexts. Add tests for: - Consistent <text> wrapper in handlebars output - Empty message content (no self-closing tags) - Greater-than character handling and round-trip - Backward compatibility of from_rendered_prompt with old format Co-authored-by: Copilot <[email protected]>
…erage - Use <text> child element in handlebars fallback path instead of direct text node, matching the Jinja2/to_element() XML contract - Add > to XML metacharacter escape test - Add tests for fallback path (non-ChatMessageContent context) with both empty and non-empty block content - Add backward compatibility test verifying from_rendered_prompt parses the old format without <text> wrapper Co-authored-by: Copilot <[email protected]>
Motivation and Context
This PR updates the Jinja2 and Handlebars prompt-template helpers to serialize chat messages through the existing XML/message serializer instead of assembling message XML manually. It also aligns the prompt-template samples with the serializer-backed helper and adds regression coverage for common message content.
Description
Prompt template message serialize improvements.
Contribution Checklist